SPsort: How to Sort a Terabyte Quickly
Abstract
In December 1998, a 488-node IBM RS/6000 SP sorted a terabyte of data (10 billion 100-byte records) in 17 minutes, 37 seconds. This is more than 2.5 times faster than the previous record for a problem of this magnitude. The SPsort program itself was custom-designed for this benchmark, but the cluster, its interconnection hardware, disk subsystem, operating system, file system, communication library, and job management software are all IBM products. During the sort, the system sustained an aggregate data rate of 2.8 GB/s from more than 6 TB of disks managed by the GPFS global shared file system. Simultaneously with these transfers, it also sustained 1.9 GB/s of local disk I/O and 5.6 GB/s of interprocessor communication.
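The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration derived from the numbers quoted above; the function name `sort_summary` and its field names are hypothetical, and the "implied previous record" is a lower bound inferred from the phrase "more than 2.5 times faster", not a figure reported by the authors.

```python
# Figures taken from the abstract; everything derived below is plain arithmetic.
RECORDS = 10_000_000_000        # 10 billion records
RECORD_BYTES = 100              # 100-byte records
ELAPSED_S = 17 * 60 + 37        # 17 minutes, 37 seconds


def sort_summary(records=RECORDS, record_bytes=RECORD_BYTES, elapsed_s=ELAPSED_S):
    """Return derived figures for the benchmark (hypothetical helper)."""
    total_bytes = records * record_bytes
    return {
        "total_bytes": total_bytes,
        "elapsed_s": elapsed_s,
        # Average end-to-end rates over the whole run.
        "records_per_s": records / elapsed_s,
        "sorted_gb_per_s": total_bytes / elapsed_s / 1e9,
        # "More than 2.5 times faster" implies the earlier record took
        # longer than this many seconds (an inference, not a reported value).
        "implied_previous_record_min_s": 2.5 * elapsed_s,
    }


summary = sort_summary()
print(f"data sorted:        {summary['total_bytes']:.3e} bytes")
print(f"elapsed:            {summary['elapsed_s']} s")
print(f"end-to-end rate:    {summary['sorted_gb_per_s']:.3f} GB/s")
print(f"records per second: {summary['records_per_s']:,.0f}")
print(f"previous record >   {summary['implied_previous_record_min_s'] / 60:.1f} min")
```

The end-to-end rate (input size divided by elapsed time) comes out below the 2.8 GB/s aggregate GPFS rate quoted in the abstract. The two are different measures: the aggregate figure covers file system transfers during the sort, while the end-to-end rate counts each input byte once.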
Related resources
High-speed parallel external sorting of data with arbitrary distribution
Many parallel algorithms for sorting (external) disk data have been reported, such as NOWsort, SPsort, and hill sort. They all reduce the execution time compared to some known sequential sort; however, they differ in terms of speed, throughput, and cost-effectiveness. Mostly they deal with data that are uniformly distributed in their value range. If we divide and redistribute data to pro...
Distribution-Insensitive Parallel External Sorting on PC Clusters
Many parallel external sorting algorithms have been reported, such as NOW-Sort, SPsort, and hill sort. They are for sorting large-scale data stored on disk, but they differ in speed, throughput, and cost-effectiveness. Mostly they deal with data that are uniformly distributed in their value range. Few research results have yet been reported for parallel external sort for data w...
The Record-Breaking Terabyte Sort on a Compaq Cluster
Sandia National Laboratories (U.S. Department of Energy) and Compaq Computer Corporation built a 72-node Windows NT cluster, which Sandia utilizes for production work contracted by the U.S. government. Recently, Sandia and Compaq's Tandem Division collaborated on a project to run a 1-terabyte commercial-quality scalable sort on this cluster. The audited result was a new world record of 46.9 min...
Dublin City University at the TREC 2006 Terabyte Track
For the 2006 Terabyte track in TREC, Dublin City University’s participation was focussed on the ad hoc search task. As per the previous two years [7, 4], our experiments on the Terabyte track have concentrated on the evaluation of a sorted inverted index, the aim of which is to sort the postings within each posting list in such a way that allows only a limited number of postings to be processe...
The Information Lifecycle
Pankaj Mehra is a Distinguished Technologist at HP where he serves as Chief Scientist of HP Labs Russia. Prior to joining HP Labs in 2005, Pankaj was: the architect of HP’s Integrated Archive Platform and NonStop Advanced Architecture; Chair of InfiniBand Management WG; CTO of IntelliFabric; and the designer of clusters that held TPC-C and Terabyte sort benchmark records. Pankaj has written 3 b...